ABSTRACT

The cohort investigated by HIRAYAMA has a severe selection bias by age. The
effect of removing this selection bias by iterative proportional fitting of a
contingency table to given marginals is investigated. The risk increase
reported by HIRAYAMA disappears completely when one removes the selection bias
by age. Had the cases been observed as they occur in the female population,
no risk increase would have been observed. Only in the subgroup of women
married to industry workers does a risk increase remain, which might be due
to confounding factors. Assuming modest differential misclassification also
leads to risk ratios around unity.


INTRODUCTION 

The statistical association between environmental tobacco smoke and lung cancer
is controversial. The HIRAYAMA study seems to provide sound epidemiological 
evidence supporting this hypothesis. In a recent paper UBERLA ($) has analysed 
the published studies. Regarding the HIRAYAMA study the following facts have to 
be kept in mind: 

The study was not designed to test the hypothesis that passive smoking is
associated with lung cancer. It can therefore only generate this hypothesis,
not prove it.

The cohort was not representative of the population of Japan. A selection
bias is possible.

The exposure indicator - the fact of being married to a man who smokes -
is not reliable, not valid and not specific.

The event indicator - dying of lung cancer as noted on death certificates -
is neither reliable nor valid.

Various confounding factors - for instance exposure at the working place,
indoor air pollution, overall air pollution, type of medical care - were
not accounted for.

Bias in registering the fact that a woman is a nonsmoker was not controlled.
Resulting differential misclassifications of the cases who were smokers and
had to be excluded have not been considered.

Almost nothing is known about the 200 cases. No case reports are available;
autopsy and histology are available in only 11.5 %.
The core of the information on which the results of this study rely is

1.) that during 1965 200 women in Japan told an interviewer on a single
    occasion that they were - during that time - non-smokers and their
    husbands said that they were smokers, which might have been different
    before and afterwards, and

2.) that their death certificates subsequently contained the diagnosis lung
    cancer, which might have been erroneous.

Such sparse information does not seem convincing.

In our paper we consider four questions:

1.) What is the relative risk when one removes the selection bias regarding
    age of women in the HIRAYAMA cohort?

2.) What is the relative risk when one additionally accounts for the fact
    that women above 70 who are married to husbands still living are less
    frequent than reported in the population statistics?

3.) What is the relative risk for women married to men with different
    occupations, when one removes the selection bias regarding age of men?

4.) What is the relative risk when additionally some modest differential
    misclassification is assumed?

MATERIALS AND METHODS

We start from tables 1, 2 and 3 of HIRAYAMA 1984 (4). These tables contain the
most detailed published data. In order to check our program we reproduced some
of the reported relative risks with good accuracy.

AGE GROUP          PERCENT FEMALE
                   JAPAN POPULATION          HIRAYAMA COHORT


TABLE 1: Differences between the HIRAYAMA cohort and the female age
distribution over 40 in the population of Japan 1965 (Population census 1965.
Statistical survey of the economy of Japan. 1967, Ministry of Foreign Affairs
of Japan).

There are marked differences between the HIRAYAMA cohort and the female age
distribution over 40 in the population of Japan 1965. Women aged 50-59 are
overrepresented, women older than 70 are severely underrepresented. In this
age group only one percent was observed instead of 12 percent in the
population.

The investigated cohort certainly has a severe selection bias by age, which
needs no statistical test. This is likely due to the fact that the smoking
behaviour was not known in the elderly or that the husbands of older women
had died. Since it takes twenty years and more from exposure to lung cancer,
older women surely are relevant and should not be excluded. The majority of
lung cancer cases occur in older age groups, in Germany more than 67 % in
women over 65 years.
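
The size of this age selection can be illustrated from figures printed later
in the paper. The following is a minimal Python sketch, assuming that the row
totals of table 2 describe the cohort and that the row totals of table 3
are proportional to the population age distribution; the variable and
function names are illustrative and not part of the original analysis.

```python
# Women at risk per age group of the wife.
AGE_GROUPS = ["40-49", "50-59", "60-69", "70+"]
cohort_at_risk = [38025, 32089, 20344, 1082]               # row totals, table 2
adjusted_at_risk = [35700.6, 27462.0, 17392.6, 10984.8]    # row totals, table 3


def percent_shares(counts):
    """Return each count as a percentage of the overall total."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]


cohort_pct = percent_shares(cohort_at_risk)
population_pct = percent_shares(adjusted_at_risk)

for age, c, p in zip(AGE_GROUPS, cohort_pct, population_pct):
    print(f"{age:>5}  cohort {c:5.1f} %   population {p:5.1f} %")
```

Under these assumptions the group of women over 70 makes up about 1 percent
of the cohort against about 12 percent of the adjusted table, in line with
the figures quoted in the text.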
In order to answer the question what the relative risk is when the age
selection bias is removed, we adjusted the data to the age distribution of the
female population of Japan. The technique of iterative proportional fitting of
a contingency table to given marginals as described by BISHOP, FIENBERG and
HOLLAND (1) or by HARTUNG (3) was used. This technique keeps the risks
constant as observed in every cell and changes the marginals and the cell
counts according to the given age distribution of the population. Iterative
proportional fitting of contingency tables to given marginals is a well known
technique in multivariate statistics and can be applied here without changing
the observed interrelations between smoking habit, occupation and lung cancer.
From the fitted or adjusted tables the risk ratios are calculated in the usual
way.

Such risk ratios based on data with removed age selection bias are the correct
ones and should be used. One has to require that there should be no selection
bias by age and that the cases should be included as they would have occurred
in the population. Otherwise statistical tests and P-values are not very
meaningful.


WIVES                  HUSBAND'S SMOKING HABITS
AGE          NON           1 - 19           20 +            TOTAL

40-49       4   7918      21  17492      21  12615       46  38025
50-59      14   7635      46  15640      31   8814       91  32089
60-69      16   6170      31  10381      10   3793       57  20344
70 +        3    172       1    671       2    239        6   1082

TOTAL      37  21895      99  44184      64  25461      200  91540


TABLE 2: SMOKING HABIT OF HUSBAND BY AGE OF WIFE. ORIGINAL DATA
(Table 2 of HIRAYAMA 1984).


Table 2 shows the original data by age of wife. The cells contain the number of
lung cancer cases and those under risk as published by HIRAYAMA. The 1-19 group
includes ex-smokers in this and the following tables. 200 cases out of 91540
women were observed. Iterative proportional fitting to the female age
distribution of the population leaves the underlined numbers constant. The
others are adjusted using a right hand marginal which is made proportional to
the age distribution of the population.
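
The fitting step can be sketched in a few lines of Python. This is a minimal
illustration and not the authors' program: it assumes that the numbers under
risk are fitted to two marginals, namely the row totals printed in table 3
(taken here as the population age distribution scaled to 91540 women) and the
unchanged column totals of table 2, and that the cases are then recomputed
from the observed cell risks. The function name `ipf` and all variable names
are hypothetical.

```python
# Table 2: women under risk and lung cancer cases by age of wife (rows:
# 40-49, 50-59, 60-69, 70+) and husband's smoking habit (NON, 1-19, 20+).
at_risk = [
    [7918, 17492, 12615],
    [7635, 15640, 8814],
    [6170, 10381, 3793],
    [172, 671, 239],
]
cases = [
    [4, 21, 21],
    [14, 46, 31],
    [16, 31, 10],
    [3, 1, 2],
]

# Assumed target marginals: row totals as printed in table 3, column
# totals as printed in table 2.
row_targets = [35700.6, 27462.0, 17392.6, 10984.8]
col_targets = [21895.0, 44184.0, 25461.0]


def ipf(table, row_targets, col_targets, tol=1e-6, max_iter=1000):
    """Fit a two-way table to given row and column totals by alternately
    rescaling rows and columns (iterative proportional fitting)."""
    fitted = [[float(v) for v in row] for row in table]
    n_cols = len(col_targets)
    for _ in range(max_iter):
        # Rescale every row to its target total.
        for i, row in enumerate(fitted):
            factor = row_targets[i] / sum(row)
            fitted[i] = [v * factor for v in row]
        # Rescale every column to its target total.
        col_sums = [sum(row[j] for row in fitted) for j in range(n_cols)]
        for row in fitted:
            for j in range(n_cols):
                row[j] *= col_targets[j] / col_sums[j]
        # Stop when the row totals still agree after the column step.
        if all(abs(sum(r) - t) < tol for r, t in zip(fitted, row_targets)):
            break
    return fitted


fitted_at_risk = ipf(at_risk, row_targets, col_targets)

# The risk in every cell stays as observed; only the numbers under risk change.
fitted_cases = [
    [cases[i][j] / at_risk[i][j] * fitted_at_risk[i][j] for j in range(3)]
    for i in range(4)
]
total_cases = sum(sum(row) for row in fitted_cases)
print(f"expected cases after adjustment: {total_cases:.1f}")
```

Under these assumptions the adjusted table sums to roughly 232 expected cases,
the total quoted for table 3.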

RESULTS 

Table 3 gives the results of iterative proportional fitting to the female age
distribution of the population. It contains the numbers of those under risk
and of lung cancer deaths as they would have been observed if HIRAYAMA had
not excluded or preferred certain age groups. The age selection bias is
removed. The risks in the individual cells are still the same as those observed
by HIRAYAMA. Also the structure of the common distribution regarding age,
smoking habit and lung cancer is unchanged. HIRAYAMA would have observed a
total of 232 cases instead of 200, with the corresponding numbers in the
individual cells, had he included all women as they live in the population.
This table is the best available starting point for age-adjusted risk ratio
calculations. It has not been used so far.
WIVES                      HUSBAND'S SMOKING HABITS
AGE            NON               1 - 19               20 +               TOTAL

40-49       3.91   7748.8     19.12  15927.8     20.02  12024.0      43.05  35700.6
50-59      12.49   6813.7     38.20  12987.1     26.95   7661.2      77.64  27462.0
60-69      14.25   5496.6     25.70   8604.9      8.68   3291.1      48.63  17392.6
70 +       32.02   1835.9      9.93   6664.2     20.79   2484.7      62.74  10984.8

TOTAL      62.67  21895       92.95  44184       76.44  25461       232.06  91540


TABLE 3: SMOKING HABIT OF HUSBAND BY AGE OF WIFE (Table 2
of HIRAYAMA 1984). Removed selection bias: Data adjusted
to the age distribution of women in the population.


                       HUSBAND'S SMOKING HABITS
                       NON       1-19      20 +

UPPER PART
RR                     1.00      1.37      1.56
L90                              1.00      1.11
MH-CHI                           1.51      2.27
P ONE TAILED                     .065      .012*

LOWER PART
RR                     1.00      .77       1.06
L90                              .59       .80
MH-CHI                           2.19      .27
P ONE TAILED                     .014**    .395

UPPER PART : STANDARDIZED BY AGE OF WOMEN ONLY
LOWER PART : AGE SELECTION BIAS REMOVED AND STANDARDIZED
             BY AGE OF WOMEN

RR  : Weighted point estimate of rate ratio
L90 : Lower 90 percent confidence interval
*   : "Significant" in positive direction
**  : "Significant" in negative direction

TABLE 4: RELATIVE RISK BY AGE OF WOMEN (Calculated from
table 2 of HIRAYAMA 1984)

In the upper part of table 4 one finds the risk ratios standardized by age
only, as reported by HIRAYAMA. The lower part contains the risk ratios after
removing the age selection bias. In the upper part the weighted point
estimate of the rate ratio is 1.56 in the 20 + group and is technically
"significant". L90 designates the lower point of the 90-percent confidence
interval in this and the following tables, as it was used by HIRAYAMA.
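
A weighted rate ratio over age strata can be sketched as follows. The paper
does not state which weights were used for the "weighted point estimate", so
the Mantel-Haenszel risk ratio below is a commonly used stand-in chosen for
illustration only, and its output need not equal the figures in table 4. The
function name `mantel_haenszel_rr` and the data layout are assumptions.

```python
def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled risk ratio over strata.

    Each stratum is a tuple
    (exposed_cases, exposed_at_risk, unexposed_cases, unexposed_at_risk).
    """
    numerator = 0.0
    denominator = 0.0
    for a, n1, c, n0 in strata:
        total = n1 + n0
        numerator += a * n0 / total      # exposed cases, weighted
        denominator += c * n1 / total    # unexposed cases, weighted
    return numerator / denominator


# Table 2: husbands smoking 20+ versus non-smoking husbands, by age of wife.
strata_20_plus = [
    (21, 12615, 4, 7918),    # 40-49
    (31, 8814, 14, 7635),    # 50-59
    (10, 3793, 16, 6170),    # 60-69
    (2, 239, 3, 172),        # 70+
]
rr_20_plus = mantel_haenszel_rr(strata_20_plus)
print(f"Mantel-Haenszel risk ratio, 20+ versus NON: {rr_20_plus:.2f}")
```

The same function can be applied to the adjusted numbers of table 3 by
passing the fitted cases and numbers under risk instead of the original ones.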
The risk increase disappears completely when one removes the selection bias by
age. In the 20 + group the rate ratio is 1.06, hardly a relevant risk increase.
In the group of 1-19 cigarettes per day it is .77, which is a technically
significant risk decrease. The adjusted rate ratio, considering all those
exposed in one group versus those not exposed, is .81 with a confidence
interval including unity. If HIRAYAMA had observed the cases as they occur in
the population without selection bias by age, he would have observed no risk
increase, but a slight and meaningless risk decrease. This is the main result
of our reanalysis, which corresponds well with the result of the prospective
American cohort study as published by GARFINKEL (2).

In the discussion following our paper in Tokyo last November HIRAYAMA noted
that in the population the percentage of women over 70 married to men who are
still alive is smaller than the percentage of women reported in the population
statistics. Since we do not have the numbers we assume that only half of the
women over 70 reported in the population census 1965 have been married to
living husbands. The resulting hypothetical population together with the
HIRAYAMA cohort is presented in table 5.


AGE GROUP          PERCENT FEMALE
                   HYPOTHETIC POPULATION     HIRAYAMA COHORT


TABLE 5: DIFFERENCES BETWEEN THE HIRAYAMA COHORT AND A HYPOTHETIC
FEMALE AGE DISTRIBUTION OVER 40. (Explanation see text)

There is still possibly a selection bias in table 5. Now 6 percent of women
over 70 would have been included in the hypothetic female distribution instead
of 12 percent. The corresponding lung cancer cases, which generally are more
frequent in this age group than in younger women, would have been excluded.
The reduction to one half accounts sufficiently for the argument of HIRAYAMA
mentioned above. The resulting relative risks are presented in table 6. Even
with these assumptions the relative risk is only 1.03 in the group of women
married to husbands smoking 1-19 cigarettes per day, 1.29 in the 20 + group
and 1.12 if one considers the smoking group altogether. All these risk ratios
are not statistically different from unity.


                       HUSBAND'S SMOKING HABITS
                       NON       1-19      20 +      SMOKER

RR                     1.00      1.03      1.29      1.12
L90                              .77       .94       .85
MH-CHI                           .05       1.33      .47


TABLE 6: RELATIVE RISK BY AGE OF WOMEN, AGE SELECTION BIAS REMOVED
AND HYPOTHETICALLY ADJUSTED TO HIRAYAMA'S ARGUMENT
(see table 5)


Source: https://www.industrydocuments.ucsf.edu/docs/gxvj0000 




Since it is impossible for us to reconstruct the real situation some twenty
years ago in Japan regarding the conditional distributions of males and
females regarding age, smoking and family status, the reported results of the
HIRAYAMA study can not be conclusive to us. As long as the selection bias by
age can not be explained numerically in a sufficient way by HIRAYAMA, his
thesis that there is a significant and relevant risk increase based on his
data might as well be wrong.

We now consider two occupations, farmers and industry workers. From the upper
part of table 7 one can see that the relative risk for wives of farmers seems
substantial when one standardizes by age of men only. The point estimates of
the rate ratios are 1.40 and 1.63 respectively. This was observed earlier and
had no adequate explanation. If one removes the selection bias by age and
adjusts to the male age distribution of Japan - the numbers in the lower part
of table 7 - the rate ratios are .85 and .82, not different from unity. This
seems more plausible.



                       HUSBAND'S SMOKING HABITS
                       NON       1-19      20 +

UPPER PART
RR                     1.00      1.40      1.63
L90                              .97       1.01
MH-CHI                           1.48      1.92
P ONE TAILED                     .069      .027

LOWER PART
RR                     1.00      .85       .82
L90                              .59       .53
MH-CHI                           .42       .53
P ONE TAILED                     .337      .296


UPPER PART : STANDARDIZED BY AGE OF MEN ONLY
LOWER PART : AGE SELECTION BIAS REMOVED AND STANDARDIZED
             BY AGE OF MEN

TABLE 7: RELATIVE RISKS: WIVES OF FARMERS ONLY
(Table 3 of HIRAYAMA 1984)

Considering the wives of industry workers only, in the upper part of table 8
the point estimates of the rate ratios are 1.77 and 2.27, standardized by age
of men, being not significant. Removing the age selection bias - in the lower
part of table 8 - there is a remarkable risk increase to 4.60 and 6.90,
which is significant. However, there are only 9 lung cancer deaths in the 20 +
group and only 3 in women 70 years and older. These are small numbers, but
they are the numbers observed and used by HIRAYAMA, and his risk structure is
unchanged. Thus only in the subgroup of women married to industry workers is
there a risk increase; in all other occupations there is no risk increase.
Omitting industry workers, the point estimates of the rate ratios are .90 and
.89, not significantly different from unity. These findings are consistent
with the assumption of confounding factors in women married to industry
workers, who might be exposed to other environmental hazards. Our calculations
show that by removing the selection bias by age one can explain hitherto
implausible results.
                       HUSBAND'S SMOKING HABITS
                       NON       1-19      20 +

UPPER PART
RR                     1.00      1.77      2.27
L90                              .70       .84
MH-CHI                           .73       .81
P ONE TAILED                     .232      .208

LOWER PART
RR                     1.00      4.60      6.90
L90                              1.71      2.45
MH-CHI                           2.50      2.78
P ONE TAILED                     .006      .003


UPPER PART : STANDARDIZED BY AGE OF MEN ONLY
LOWER PART : AGE SELECTION BIAS REMOVED AND STANDARDIZED
             BY AGE OF MEN

TABLE 8: RELATIVE RISKS: WIVES OF INDUSTRY WORKERS ONLY
(Table 3 of HIRAYAMA 1984)


Active smoking is correlated among married couples. In a society in which
female smokers were very rare in 1965, more women married to smokers will
declare themselves nonsmokers than the other way round. One has therefore to
consider biased or differential misclassification. There are likely more women
with lung cancer who have been misclassified as nonsmokers than the other way
round. They have to be removed from the cohort. We made some moderate
assumptions regarding misclassification, as shown in table 9. In order to
examine how sensitive the relative risk is we removed 10, 20 and 30 cases from
the exposed groups, corresponding to 5, 10 and 15 percent. Assuming 30
misclassified cases - 15 percent, a percentage which has been observed in the
literature (5) - the rate ratios are .66 and .85. In the group 1-19 cigarettes
per day all the risk estimators are significantly smaller than unity. Our
personal opinion is that 10 differentially misclassified cases from 200 is a
fair number. The corresponding weighted point estimates of the rate ratio are
.74 and 1.00. These risk estimates are as reasonable as other risk estimates
calculated from the HIRAYAMA data. They indicate - if anything - a risk
decrease, not a risk increase.
NUMBER OF CASES ASSUMED
MISCLASSIFIED AND REMOVED                  HUSBAND'S SMOKING HABITS
FROM EXPOSED GROUPS                        NON       1-19      20 +

n = 10 =  5 %      RR                      1.00      .74       1.00
                   P ONE TAILED                      .006      .469

n = 20 = 10 %      RR                      1.00      .70       .93
                   P ONE TAILED                      .003      .383

n = 30 = 15 %      RR                      1.00      .66       .85
                   P ONE TAILED                      .001      .238


TABLE 9: RELATIVE RISK: ASSUMED DIFFERENTIAL MISCLASSIFICATION
(Age selection bias removed and standardized by age of women)


DISCUSSION 


Reanalyses of data which have been collected by others are not easy. This is
because information is not completely available, because information might be
misinterpreted or because one has to take another view in order to come closer
to the acceptable truth. Our calculations do not diminish the great value and
impact the HIRAYAMA study had on the epidemiology of passive smoking. They
show however that reasonable alternative views on the same data are possible,
which lead to opposite conclusions. Our findings are in contrast to HIRAYAMA's
thesis that - based on his data - there is a substantial statistical
association between passive smoking and lung cancer. We do not hold that our
view is the only correct one. We do hold however that the risk ratios
calculated by us, removing age selection bias, are as reasonable as the ones
calculated by HIRAYAMA. Since they go back to the population and not to a
selected sample our estimates could be preferable. Hypothetically accounting
for the argument of HIRAYAMA, that in the population the percentage of women
over 70 married to men who are still alive is smaller than the percentage of
women reported in the population statistics, does not change our results. Our
risk estimates are a consequence of the data published by HIRAYAMA and can not
be rejected from the study data as they are published so far.
                                           HUSBAND'S SMOKING HABITS
                                           NON       1-19      20 +

AGE SELECTION BIAS       RR                1.00      .77       1.06
REMOVED AND AGE-         P ONE TAILED                .014      .395
STANDARDIZED (WOMEN)

WITHOUT INDUSTRY         RR                1.00      .90       .89
WORKERS, AGE SELECTION   P ONE TAILED                .394      .179
BIAS REMOVED AND AGE-
STANDARDIZED (MEN)

10 CASES ASSUMED         RR                1.00      .74       1.00
MISCLASSIFIED, AGE       P ONE TAILED                .006      .469
SELECTION BIAS
REMOVED AND AGE-
STANDARDIZED (WOMEN)


TABLE 10: REANALYSIS OF HIRAYAMA'S DATA; SUMMARY OF RELATIVE RISKS


To summarize: Removing the age selection bias in the HIRAYAMA study one gets a
relative risk of 1.06 in the group of women married to men smoking more than 20
cigarettes per day. In the group of women married to men smoking 1-19
cigarettes per day the relative risk is .77, a technically "significant" risk
decrease. If HIRAYAMA could have observed the lung cancer cases as they occur
in the female population, he would have observed no risk increase, but a risk
decrease to around .81, not significantly different from unity, considering
those exposed versus those not exposed.

If one omits the wives married to industry workers because of possible
confounding factors the relative risk is .90 and .89 respectively. These
values are of the same order of magnitude and smaller than unity. Here we
could adjust and standardize by occupation and age of men only, which is not
as appropriate as by age of women.








If one assumes that 10 cases are differentially misclassified and removes them 
from the exposed groups, the risk estimates are .74 and 1.00 respectively. 

Our findings demonstrate how sensitive the data of this study are and how weak
the evidence for a statistical association between passive smoking and lung
cancer from this study is. In view of these and other facts, which we mentioned
in the introduction, the null hypothesis might be true as well and is
consistent with the HIRAYAMA data in the same way as the alternative
hypothesis.

We would be glad to apply our technique to more detailed data if we can get
them from HIRAYAMA, for instance in order to adjust by occupation of men and
age of women, or by occupation of men and by age of women married to a husband
who is still alive. We are ready to modify our view if such data can support
the alternative hypothesis better than the published data do. We do hope
that our calculations give rise to a fruitful discussion. The methods we used
here might be of interest to the analysis of other cohort and control studies.